Thank you. Hello everyone.
Thanks for this nice introduction.
My name is Sebastian, and I will talk about
exclusive file systems for power users
with BeeGFS and NVMe network storage.
The goal of this talk is to give you an overview of the options
for isolating certain I/O workloads from your parallel file system.
Some of the solutions I will present are already in use at ZIH.
Others are being evaluated at the moment to plan their deployment,
and some of the work is more or less research.
So for the agenda today, I will start with a short motivation:
why we want to isolate I/O from the parallel file system.
Then I will talk about how we can use BeeGFS as a project-local file system solution.
And then we will look at some other technologies, such as NVMe over Fabrics:
what can we do with it, and how does it perform?
And finally, we will talk about ad hoc file systems,
which are kind of a recent topic in file system research.
And afterwards, I will sum up with a conclusion.
So let's start.
In general, when you're looking at parallel I/O, it's really a global thing:
it touches a lot of components in a typical HPC cluster.
On the left side, we have this schematic picture
where we see the compute nodes at the bottom,
and at the top, the nodes that make up the parallel file system.
In between, there's a network fabric, and both the compute nodes
and the parallel file system connect to it in some way.
To talk to each other, they have to go through this network,
and everything should be fine.
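To make this concrete, here is a minimal sketch (not from the talk) of what such I/O looks like from the application side: an MPI-IO program in which every rank writes its own block of a shared file. The path /pfs/out.dat and the block size are made up for illustration; the point is that every byte written this way has to cross the shared fabric to reach the file system nodes.

```c
/*
 * Hypothetical sketch: each MPI rank writes one 1 MiB block of a
 * shared file. On a cluster, "/pfs/out.dat" would live on the
 * parallel file system, so all of this traffic traverses the
 * shared network fabric from the schematic above.
 */
#include <mpi.h>
#include <string.h>

#define BLOCK_SIZE (1 << 20)  /* 1 MiB per rank; an arbitrary example size */

int main(int argc, char **argv)
{
    MPI_File fh;
    int rank;
    static char buf[BLOCK_SIZE];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    memset(buf, 'a' + (rank % 26), sizeof buf);

    MPI_File_open(MPI_COMM_WORLD, "/pfs/out.dat",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY,
                  MPI_INFO_NULL, &fh);

    /* Each rank writes at its own offset; no two ranks overlap. */
    MPI_File_write_at_all(fh, (MPI_Offset)rank * BLOCK_SIZE,
                          buf, BLOCK_SIZE, MPI_CHAR, MPI_STATUS_IGNORE);

    MPI_File_close(&fh);
    MPI_Finalize();
    return 0;
}
```

Launched with, say, mpirun -n 64, this single job already generates 64 concurrent write streams toward the same shared storage target.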
In reality, on an HPC cluster, there's not just one application running at a time.
There are a lot of different, mixed workloads running at the same time
on different compute nodes.
For compute power, they can run exclusively on their compute nodes,
but the parallel file system is always a shared resource,
and even the network in between is a shared resource as well.
In the figure on the right-hand side, the two graphs at the top
show the Lustre bandwidth and metadata performance from our job monitoring tool,
PIKA.
The graph at the bottom is the InfiniBand bandwidth.
And even if you cannot read the numbers because the figure is small,
you can see that the file system performance
and the network traffic correlate strongly.
OK, this means all the jobs running on the cluster use the same file system
and the same shared network.
That can cause some problems.
It can happen that some users or some jobs stress the parallel file system
in such a way that all other users have to live with it
and also get lower performance or suffer from higher latencies.
So for example, we have here some graphs from our server-side monitoring,
and we see very large latency times for an ls
on our shared cluster file system.
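As a rough illustration (again not from the talk), this is how one could measure such a directory-listing latency from the client side. The program is my own sketch, not the monitoring used at ZIH; it simply times how long a full readdir pass takes.

```c
/*
 * Hypothetical sketch: time a single directory listing, which is
 * roughly what an "ls" costs. On a loaded shared file system,
 * each entry may involve a round trip to the metadata server,
 * so this latency can grow dramatically under contention.
 */
#include <dirent.h>
#include <stdio.h>
#include <time.h>

int main(int argc, char **argv)
{
    const char *path = argc > 1 ? argv[1] : ".";
    struct timespec t0, t1;
    long entries = 0;

    clock_gettime(CLOCK_MONOTONIC, &t0);

    DIR *dir = opendir(path);
    if (!dir) { perror("opendir"); return 1; }
    while (readdir(dir) != NULL)
        entries++;
    closedir(dir);

    clock_gettime(CLOCK_MONOTONIC, &t1);

    double ms = (t1.tv_sec - t0.tv_sec) * 1e3
              + (t1.tv_nsec - t0.tv_nsec) / 1e6;
    printf("%ld entries in %.2f ms\n", entries, ms);
    return 0;
}
```

On an idle file system such a listing typically finishes in milliseconds; under heavy metadata load from other jobs, the same listing can take orders of magnitude longer, which is exactly the effect visible in the monitoring graphs.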